feat(evaluation): Add CJK tokenizer support for ROUGE-1 evaluation #4143

maru0804 · 2026-01-13T13:33:40Z

Summary

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters ([a-z0-9]), causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text. This PR adds CJK language support through an opt-in tokenizer.
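To make the failure mode concrete, here is a minimal sketch of a unigram-overlap F1 score paired with an ASCII-only tokenizer. It is a simplified stand-in written for illustration, not the rouge_score library's implementation; the function names `ascii_tokens` and `rouge1_f1` are hypothetical.

```python
import re
from collections import Counter


def ascii_tokens(text):
    # Mimics an ASCII-only tokenizer: keeps runs of [a-z0-9], drops the rest.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(reference, candidate, tokenize):
    # Simplified ROUGE-1: F1 over clipped unigram overlap.
    ref = Counter(tokenize(reference))
    cand = Counter(tokenize(candidate))
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


text = "今日は晴れです"
# Identical Japanese strings still score 0.0 because no tokens survive.
ascii_score = rouge1_f1(text, text, ascii_tokens)
# Splitting into characters (here simply `list`) restores a meaningful score.
char_score = rouge1_f1(text, text, list)
```

With the ASCII-only tokenizer both strings tokenize to an empty list, so the score is 0.0 even for an exact match; a per-character split yields 1.0 for the same pair.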

Changes

New Features

CJKTokenizer: A character-based tokenizer for CJK languages that:
- Tokenizes CJK characters individually (1 char = 1 token)
- Preserves word-based tokenization for ASCII alphanumeric
- Removes CJK punctuation (U+3000-U+303F)
- Skips other scripts (Greek, Cyrillic, fullwidth alphanumeric, etc.)
RougeScoreCriterion: New criterion class to specify tokenizer options
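The tokenization rules listed above can be sketched with a single regular expression. This is an illustration under assumed Unicode ranges, not the PR's actual CJKTokenizer; the name `tokenize_cjk` and the exact ranges are assumptions.

```python
import re

# Assumed ranges for this sketch; the ranges used in the PR may differ.
_TOKEN_RE = re.compile(
    r"[a-z0-9]+"          # ASCII alphanumeric runs stay whole words
    r"|[\u4e00-\u9fff]"   # CJK Unified Ideographs, one token per character
    r"|[\u3040-\u309f]"   # Hiragana
    r"|[\u30a0-\u30ff]"   # Katakana
    r"|[\uac00-\ud7a3]"   # Hangul syllables
)


def tokenize_cjk(text):
    # Empty strings and None produce no tokens.
    if not text:
        return []
    # Anything the pattern does not match (CJK punctuation, Greek,
    # Cyrillic, fullwidth alphanumerics) is silently skipped.
    return _TOKEN_RE.findall(text.lower())


tokens = tokenize_cjk("Hello 世界、テスト2024")
```

In this sketch the ideographic comma (U+3001) falls outside every listed range, so it is dropped without needing an explicit punctuation rule.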

Modified

RougeEvaluator: Updated to support custom tokenizers and log a warning (once) when CJK text is detected without proper tokenizer configuration
ResponseEvaluator: Updated to pass eval_metric (including criterion) to RougeEvaluator
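The warn-once behavior can be sketched as follows. The class name `CjkWarningOnce`, the logger name, and the detection ranges are hypothetical and only approximate what the description says; the PR's RougeEvaluator may structure this differently.

```python
import logging
import re

logger = logging.getLogger("rouge_sketch")  # hypothetical logger name

# Assumed detection ranges: kana, CJK ideographs, Hangul syllables.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]")


class CjkWarningOnce:
    """Emits at most one warning per instance when CJK text is seen."""

    def __init__(self, has_cjk_tokenizer=False):
        self._has_cjk_tokenizer = has_cjk_tokenizer
        self._warned = False

    def check(self, text):
        # Returns True only on the call that actually logs the warning.
        if self._has_cjk_tokenizer or self._warned or not text:
            return False
        if _CJK_RE.search(text):
            logger.warning(
                "CJK text detected without a CJK tokenizer; "
                "ROUGE-1 scores may be 0.0."
            )
            self._warned = True
            return True
        return False
```

Keeping the flag on the instance rather than at module level matches the "once per instance" wording in the test coverage section: a fresh evaluator warns again.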

Usage

from google.adk.evaluation.eval_metrics import EvalMetric, RougeScoreCriterion

criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
eval_metric = EvalMetric(
    metric_name='response_match_score',
    threshold=0.8,
    criterion=criterion,
)

Backward Compatibility

✅ Default behavior unchanged (ASCII-only tokenization)
✅ Existing tests pass (365 evaluation tests)
✅ Warning logged when CJK detected without tokenizer

Limitations (documented in docstrings)

Fullwidth alphanumeric (Ａ-Ｚ, ０-９) are skipped
Greek, Cyrillic, and other non-CJK scripts are skipped
Character-based tokenization, not morphological analysis (for Japanese morphological analysis, consider MeCab)
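For callers affected by the fullwidth limitation, one possible workaround is to normalize text before evaluation. The sketch below is caller-side preprocessing and not part of this PR; `ascii_words` and `normalize_fullwidth` are hypothetical helpers. Note that NFKC normalization also rewrites other compatibility characters (for example halfwidth katakana), which may or may not be desirable.

```python
import re
import unicodedata


def ascii_words(text):
    # ASCII-only word matching, as in the default tokenizer's behavior.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize_fullwidth(text):
    # NFKC maps fullwidth forms such as "Ａ" and "２" to ASCII "A" and "2".
    return unicodedata.normalize("NFKC", text)


before = ascii_words("ＡＰＩ２")
after = ascii_words(normalize_fullwidth("ＡＰＩ２"))
```

Without normalization the fullwidth string yields no tokens; after normalization it yields a single ASCII word.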

Test Coverage

Added 29 new tests covering:

CJKTokenizer tokenization behavior
ROUGE score calculation with/without CJK tokenizer
Warning behavior (logged once per instance)
Edge cases (empty strings, None, mixed text, punctuation)

Fixes google#4122

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters, causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text.

Changes:
- Add CJKTokenizer class that handles CJK characters individually while preserving word-based tokenization for ASCII alphanumeric characters
- Add RougeScoreCriterion to allow explicit tokenizer specification
- Update RougeEvaluator to support custom tokenizers
- Add warning when CJK text is detected without CJK tokenizer
- Preserve backward compatibility: default behavior unchanged

Usage:

    criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
    eval_metric = EvalMetric(
        metric_name='response_match_score',
        threshold=0.8,
        criterion=criterion,
    )

Limitations:
- Fullwidth alphanumeric (Ａ-Ｚ, ０-９) are skipped
- Greek, Cyrillic, and other non-CJK scripts are skipped
- This is character-based tokenization, not morphological analysis

gemini-code-assist · 2026-01-13T13:34:30Z

Summary of Changes

Hello @maru0804, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ROUGE-1 evaluation framework by introducing robust support for CJK (Chinese, Japanese, Korean) languages. Previously, the default ROUGE tokenizer's ASCII-only nature led to inaccurate, often zero, scores for CJK text. The changes address this by providing an opt-in CJK-specific tokenizer and a mechanism to configure it, ensuring that evaluations involving multilingual content yield meaningful results. The update also includes a helpful warning system to alert users when CJK text is present but no appropriate tokenizer is configured, promoting correct usage.

Highlights

CJK Tokenizer Introduction: A new CJKTokenizer is added, specifically designed for CJK (Chinese, Japanese, Korean) languages, which tokenizes CJK characters individually, preserves word-based tokenization for ASCII alphanumeric text, and removes CJK punctuation.
Configurable ROUGE Evaluation: A RougeScoreCriterion class is introduced, allowing users to specify custom tokenizer options for ROUGE-1 evaluation, enabling proper scoring for CJK text.
Enhanced ROUGE Evaluator: The RougeEvaluator is updated to support these custom tokenizers and now logs a warning (once per instance) if CJK text is detected without a suitable tokenizer configured, guiding users to better evaluation practices.
Improved Metric Propagation: The ResponseEvaluator is modified to correctly pass the eval_metric (including the new RougeScoreCriterion) to the RougeEvaluator, ensuring that tokenizer configurations are applied.
Comprehensive Testing: 29 new tests have been added to validate the CJKTokenizer's behavior, ROUGE score calculations with and without the CJK tokenizer, and the warning mechanism, ensuring robustness and correctness.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request does an excellent job of adding CJK language support for ROUGE-1 evaluation, addressing a significant limitation of the default tokenizer. The introduction of CJKTokenizer and RougeScoreCriterion is well-designed, and the changes are thoughtfully integrated to ensure backward compatibility. The addition of comprehensive unit tests covering the new tokenizer, scoring logic, and warning behavior is particularly commendable. My review includes a couple of suggestions to improve performance and code conciseness, but overall, this is a high-quality contribution.

src/google/adk/evaluation/final_response_match_v1.py

…instantiation

Address Gemini Code Assist review feedback:

1. Simplify CJKTokenizer.tokenize() using re.findall instead of manual loop
   - More concise and idiomatic Python
   - Same functionality with fewer lines of code
2. Optimize RougeScorer instantiation for performance
   - Create RougeScorer once in __init__ instead of per invocation
   - Reuse self._scorer across all evaluate_invocations calls
   - Avoids unnecessary object creation in loops

ryanaiagent · 2026-01-15T20:06:02Z

Hi @maru0804, Thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

ryanaiagent · 2026-01-15T20:06:16Z

Hi @ankursharmas, can you please review this?

ryanaiagent · 2026-01-26T19:25:25Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for CJK languages in ROUGE-1 evaluation by adding an optional CJK-aware tokenizer. The changes are well-structured, with a new RougeScoreCriterion for configuration, a CJKTokenizer implementation, and updates to RougeEvaluator to use it. The refactoring of RougeEvaluator to initialize the scorer once is a good performance improvement. The new functionality is backed by a comprehensive set of unit tests covering tokenization, score calculation, and the new warning mechanism for misconfigurations. My review includes a couple of suggestions to improve code clarity and test robustness. Overall, this is a high-quality contribution.

gemini-code-assist · 2026-01-26T19:27:12Z

src/google/adk/evaluation/final_response_match_v1.py

+    tokenizer: Optional[tokenizers.Tokenizer] = None
+    use_stemmer = True
+
+    if eval_metric.criterion:
+      try:
+        criterion = RougeScoreCriterion.model_validate(
+            eval_metric.criterion.model_dump()
+        )
+        if criterion.tokenizer == "cjk":
+          tokenizer = CJKTokenizer()
+          use_stemmer = False  # Stemming not applicable to CJK
+      except ValidationError:
+        pass  # Different criterion type, ignore
+
+    # Create scorer once for reuse across invocations (performance optimization)
+    if tokenizer:
+      self._scorer = rouge_scorer.RougeScorer(
+          ["rouge1"], use_stemmer=False, tokenizer=tokenizer
+      )
+      self._has_cjk_tokenizer = True
+    else:
+      self._scorer = rouge_scorer.RougeScorer(
+          ["rouge1"], use_stemmer=use_stemmer
+      )
+      self._has_cjk_tokenizer = False


The logic for determining whether to use a stemmer can be simplified. The use_stemmer variable is only ever True when it's used to initialize RougeScorer in the else branch. The assignment use_stemmer = False is redundant because when a custom tokenizer is provided, use_stemmer is hardcoded to False during scorer creation anyway.

You can simplify this block to improve clarity and remove the redundant variable and its associated logic.

    tokenizer: Optional[tokenizers.Tokenizer] = None

    if eval_metric.criterion:
      try:
        criterion = RougeScoreCriterion.model_validate(
            eval_metric.criterion.model_dump()
        )
        if criterion.tokenizer == "cjk":
          tokenizer = CJKTokenizer()
      except ValidationError:
        pass  # Different criterion type, ignore

    # Create scorer once for reuse across invocations (performance optimization)
    if tokenizer:
      self._scorer = rouge_scorer.RougeScorer(
          ["rouge1"], use_stemmer=False, tokenizer=tokenizer
      )
      self._has_cjk_tokenizer = True
    else:
      self._scorer = rouge_scorer.RougeScorer(
          ["rouge1"], use_stemmer=True
      )
      self._has_cjk_tokenizer = False

gemini-code-assist · 2026-01-26T19:27:12Z

tests/unittests/evaluation/test_final_response_match_v1.py

+  def test_tokenize_korean(self):
+    tokenizer = CJKTokenizer()
+    tokens = tokenizer.tokenize("테스트")
+    assert len(tokens) == 3  # 3 Hangul syllables


This assertion is correct, but it could be more specific to prevent potential regressions. Checking only the length of the token list might not catch issues where the tokenization is incorrect but produces the right number of tokens (e.g., ['테', '스', 'x']). It would be more robust to assert the exact content of the token list.

Suggested change

-    assert len(tokens) == 3  # 3 Hangul syllables
+    assert tokens == ["테", "스", "트"]
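The reviewer's point can be illustrated with a deliberately broken tokenizer. The function `buggy_tokenize` below is a hypothetical regression invented for this sketch; it is not code from the PR.

```python
def buggy_tokenize(text):
    # Hypothetical regression: the last character is replaced by "x".
    if not text:
        return []
    return list(text[:-1]) + ["x"]


tokens = buggy_tokenize("테스트")

# The length-only check cannot see the corrupted token...
length_check_passes = len(tokens) == 3
# ...while the exact-content check catches it.
content_check_passes = tokens == ["테", "스", "트"]
```

Here the length check still passes while the content check fails, which is the kind of regression the stricter assertion is meant to catch.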

adk-bot added the eval label ([Component] This issue is related to evaluation) Jan 13, 2026

gemini-code-assist bot reviewed Jan 13, 2026

maru0804 mentioned this pull request Jan 13, 2026

response_match_score (ROUGE-1) is not effectively in Japanese (with manual tokenization) #4122

Open

ftnext mentioned this pull request Jan 14, 2026

Eval fails for non-English languages #3111

Open

ryanaiagent self-assigned this Jan 15, 2026

ryanaiagent added the needs review label ([Status] The PR/issue is awaiting review from the maintainer) Jan 15, 2026

Merge branch 'main' into fix/4122-cjk-rouge-tokenizer

73e2fad

gemini-code-assist bot reviewed Jan 26, 2026
